Recursive alignment block classification technique for word reordering in statistical machine translation

Authors

  • Marta R. Costa-Jussà
  • José A. R. Fonollosa
  • Enric Monte-Moreno
Abstract

Statistical machine translation (SMT) is based on alignment models, which learn word correspondences between the source and target languages from bilingual corpora. These models are assumed to be capable of learning reorderings. However, the difference in word order between two languages is one of the most important sources of errors in SMT. In this paper, we show that SMT can take advantage of inductive learning to solve reordering problems. Given a word alignment, we identify those pairs of consecutive source blocks (sequences of words) whose translation is swapped, i.e. those blocks which, if swapped, generate a correct monotonic translation. Afterwards, we classify these pairs into groups, recursively following a co-occurrence block criterion, in order to infer reorderings. Within the same group, we allow new internal combinations in order to generalize the reordering to unseen pairs of blocks. Then, we identify the pairs of blocks in the source corpora (both training and test) that belong to the same group. We swap them and use the modified source training corpora to realign and to build the final translation system. We have evaluated our reordering approach in terms of both alignment and translation quality. In addition, we have used two state-of-the-art SMT systems: a phrase-based and an Ngram-based system. Experiments are reported on the EuroParl task, showing improvements of almost 1 point in the standard MT evaluation metrics (mWER and BLEU).

M. R. Costa-jussà: Barcelona Media Innovation Center, Av. Diagonal 177, 08018 Barcelona, Spain
J. A. R. Fonollosa, E. Monte: Universitat Politècnica de Catalunya, TALP Research Center, Jordi Girona 1-3, 08034 Barcelona, Spain

Lang Resources & Evaluation (2011) 45:165–179, DOI 10.1007/s10579-010-9133-9
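The first step described in the abstract, finding pairs of consecutive source blocks whose translation is swapped, can be illustrated with a small Python sketch. This is not the authors' implementation: the function names (`target_span`, `swapped_block_pairs`, `swap_blocks`), the `max_block_len` limit, and the simplified swap test (every target word linked to the right block precedes every target word linked to the left block) are assumptions made for illustration. The recursive classification into groups is not covered here.

```python
from typing import List, Optional, Set, Tuple

# A word alignment as a set of (source_index, target_index) links.
Alignment = Set[Tuple[int, int]]
Block = Tuple[int, int]  # inclusive (start, end) source indices


def target_span(alignment: Alignment, start: int, end: int) -> Optional[Tuple[int, int]]:
    """Return (min, max) target positions linked to source words start..end,
    or None if none of those source words is aligned."""
    targets = [t for s, t in alignment if start <= s <= end]
    return (min(targets), max(targets)) if targets else None


def swapped_block_pairs(alignment: Alignment, src_len: int,
                        max_block_len: int = 3) -> List[Tuple[Block, Block]]:
    """Find adjacent source blocks [i..j] and [j+1..k] whose target spans
    appear in inverted order (hypothetical, simplified criterion)."""
    pairs = []
    for i in range(src_len):
        for j in range(i, min(i + max_block_len, src_len)):
            left = target_span(alignment, i, j)
            if left is None:
                continue
            for k in range(j + 1, min(j + 1 + max_block_len, src_len)):
                right = target_span(alignment, j + 1, k)
                if right is None:
                    continue
                # Swapped: the right block is translated entirely before the left block.
                if right[1] < left[0]:
                    pairs.append(((i, j), (j + 1, k)))
    return pairs


def swap_blocks(words: List[str], pair: Tuple[Block, Block]) -> List[str]:
    """Return a copy of the source sentence with the two adjacent blocks exchanged."""
    (i, j), (_, k) = pair
    return words[:i] + words[j + 1:k + 1] + words[i:j + 1] + words[k + 1:]


# Toy example: Spanish "un programa ambicioso" aligned to English "an ambitious program".
source = ["un", "programa", "ambicioso"]
alignment = {(0, 0), (1, 2), (2, 1)}
pairs = swapped_block_pairs(alignment, len(source))
reordered = swap_blocks(source, pairs[0])
```

In this toy case the only swapped pair is the noun and adjective, and exchanging them yields a source order that matches the target monotonically. A real system would apply such swaps only to block pairs that belong to a learned group, as the abstract describes.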

Similar resources

Using Reordering in Statistical Machine Translation based on Recursive Alignment Block Classification

Statistical Machine Translation (SMT) is based on alignment models which learn from bilingual corpora the word correspondences between the source and target language. These models are assumed to learn word reorderings. However, the difference in word order between two languages is one of the most important sources of errors in SMT. This paper proposes a Recursive Alignment Block Classification ...

Using Reordering in Statistical Machine Translation based on Alignment Block Classification

Statistical Machine Translation (SMT) is based on alignment models which learn from bilingual corpora the word correspondences between the source and target language. These models are assumed to learn word reorderings. However, the difference in word order between two languages is one of the most important sources of errors in SMT. This paper proposes a Recursive Alignment Block Classification ...

Word Alignment-Based Reordering of Source Chunks in PB-SMT

Reordering poses a big challenge in statistical machine translation between distant language pairs. The paper presents how reordering between distant language pairs can be handled efficiently in phrase-based statistical machine translation. The problem of reordering between distant languages has been approached with prior reordering of the source text at chunk level to simulate the target langu...

Neural Reordering Model Considering Phrase Translation and Word Alignment for Phrase-based Translation

This paper presents an improved lexicalized reordering model for phrase-based statistical machine translation using a deep neural network. Lexicalized reordering suffers from reordering ambiguity, data sparseness, and noise in the phrase table. The previous neural reordering model successfully solves the first and second problems but fails to address the third. Therefore, we propose new featur...

Iterative reordering and word alignment for statistical MT

Word alignment is necessary for statistical machine translation (SMT), and reordering as a preprocessing step has been shown to improve SMT for many language pairs. In this initial study we investigate if both word alignment and reordering can be improved by iterating these two steps, since they both depend on each other. Overall no consistent improvements were seen on the translation task, but...

Journal title:
  • Language Resources and Evaluation

Volume 45

Pages 165–179

Publication date 2011